On-Demand Query Result Cleaning
Abstract
Incomplete data is ubiquitous. When a user issues a query over incomplete data, the results may contain incomplete data as well. If a user requires high-precision query results, or if current estimation algorithms fail to make accurate estimates on incomplete data, human data collection may instead be used to find or confirm the missing values. We propose an approach that incrementally confirms incomplete data. First, queries on incomplete data are processed by a probabilistic database system, and incomplete data in the query results is represented in a form called candidate questions. Second, we incrementally solicit user feedback to confirm candidate questions. The challenge of this approach is to determine the order in which to confirm candidate questions with the user. To solve this, we design a framework for ranking candidate questions for user confirmation using a concept that we call cost of perfect information (CPI). The core component of CPI is a penalty function that is based on entropy. We compare the CPI of each candidate question and choose an optimal candidate question for which to solicit user feedback. Our approach achieves accurate query results with low confirmation and computation costs. Experiments on a real dataset show that our approach outperforms other strategies.
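The ranking idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's CPI: the abstract does not define the penalty function, so the sketch assumes each candidate question is a yes/no question with a known probability and uses plain Shannon entropy as a stand-in penalty. The names `binary_entropy`, `rank_candidate_questions`, `confirm_incrementally`, and the `oracle` callback are hypothetical.

```python
import math
from typing import Callable


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain answer carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def rank_candidate_questions(candidates: dict[str, float]) -> list[tuple[str, float]]:
    """Order candidate questions from most to least uncertain.

    Assumption: entropy alone stands in for the paper's CPI score.
    """
    scored = [(question, binary_entropy(p)) for question, p in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def confirm_incrementally(
    candidates: dict[str, float],
    oracle: Callable[[str], bool],
    budget: int,
) -> dict[str, bool]:
    """Ask the user (the hypothetical `oracle`) about the top-ranked question, up to `budget` times."""
    remaining = dict(candidates)
    confirmed: dict[str, bool] = {}
    for _ in range(budget):
        if not remaining:
            break
        question, score = rank_candidate_questions(remaining)[0]
        if score == 0.0:
            break  # nothing uncertain is left to confirm
        confirmed[question] = oracle(question)
        del remaining[question]
    return confirmed


# Hypothetical candidate questions with made-up probabilities.
example = {"tuple_1 in result?": 0.9, "tuple_2 in result?": 0.5, "tuple_3 in result?": 0.99}
answers = confirm_incrementally(example, oracle=lambda q: True, budget=2)
```

With a budget of two, the sketch asks about the questions whose probabilities are closest to 0.5 first, since those are the ones whose confirmation removes the most uncertainty under this simplified penalty.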
Related resources
Limiting Result Cardinalities for Multidatabase Queries Using Histograms
Integrating, cleaning and analyzing data from heterogeneous sources is often complicated by the large amounts of data and its physical distribution which can result in poor query response time. One approach to speed up the processing is to reduce the cardinality of results – either by querying only the first tuples or by obtaining a sample for further processing. In this paper we address the pr...
Keyword query cleaning
Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and efficient keyword query processing over databases a very challenging task. In this paper, we introduce the problem of query cleaning for keyword search queries in a database context and propose a set of effective and ef...
QOCO: A Query Oriented Data Cleaning System with Oracles
As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a data...
Discovering Popular Clicks' Pattern of Teen Users for Query Recommendation
Search engines are still the most important gates for information search in internet. In this regard, providing the best response in the shortest time possible to the user's request is still desired. Normally, search engines are designed for adults and few policies have been employed considering teen users. Teen users are more biased in clicking the results list than are adult users. This leads...
Query-Driven Approach to Entity Resolution
This paper explores “on-the-fly” data cleaning in the context of a user query. A novel Query-Driven Approach (QDA) is developed that performs a minimal number of cleaning steps that are only necessary to answer a given selection query correctly. The comprehensive empirical evaluation of the proposed approach demonstrates its significant advantage in terms of efficiency over traditional techniqu...